HAT4RD: Hierarchical Adversarial Training for Rumor Detection in Social Media

With the development of social media, social communication has changed. While this facilitates people’s communication and access to information, it also provides an ideal platform for spreading rumors. In normal or critical situations, rumors can affect people’s judgment and even endanger social security. However, natural language is high-dimensional and sparse, and the same rumor may be expressed in hundreds of ways on social media. As such, the robustness and generalization of current rumor detection models are in question. We propose a novel hierarchical adversarial training method for rumor detection (HAT4RD) on social media. Specifically, HAT4RD is based on gradient ascent: it adds adversarial perturbations to the embedding layers of the post-level and event-level modules to deceive the detector. At the same time, the detector uses stochastic gradient descent to minimize the adversarial risk and learn a more robust model. In this way, the post-level and event-level sample spaces are enhanced, and we verify the robustness of our model under a variety of adversarial attacks. Moreover, visual experiments indicate that the proposed model drifts into an area with a flat loss landscape, thereby leading to better generalization. We evaluate our proposed method on three public rumor datasets from two commonly used social platforms (Twitter and Weibo). Our experimental results demonstrate that our model achieves better results than the state-of-the-art methods.


Introduction
Today, social media is a popular news source for many people. However, without automatic rumor-detection systems, social media can be a breeding ground for rumors. Rumors can seriously affect people's lives [1]. For instance, during the early outbreak of the current COVID-19 pandemic, rumors about a national lockdown in the United States fueled panic buying of groceries and toilet paper, disrupting the supply chain, exacerbating the demand-supply gap and worsening food insecurity among the socioeconomically disadvantaged and other vulnerable populations [2]. Setting up automatic rumor detection is therefore essential.
Automatic rumor detection is extremely challenging, and the greatest difficulty lies in spotting camouflaged rumors. As the saying goes, "a rumor has a hundred mouths"; these words indicate that the ways rumors are expressed constantly change as they spread. Some malicious rumormongers may deliberately modify rumor text to escape manual detection [3]. Variability and disguise are the main characteristics of rumors, which means that a robust automatic rumor detection model is necessary. Unfortunately, most current rumor detection models are not robust enough to spot the various changes and disguises that arise during rumor propagation.
As shown in Figure 1, we simulated the constantly changing process of rumors during their propagation and found that a general deep-learning model was too sensitive to sentence changes and disguise. A BERT-base [4] model trained on the rumor dataset PHEME [5] had a prediction confidence of 0.85 for the rumor "Police say shots fired at 3 #ottawa sites National War Memorial, Parliament Hill, and now Rideau shopping centre"; however, when the input was changed to "According to the government authority report: The shootings took place at three #ottawa locations the National War Memorial parliament Hill and now the Rideau shopping centre", the model's prediction confidence decreased from 0.85 to 0.47. The main meaning and label of the input rumor text did not change, yet the model's prediction became incorrect. This result shows that the robustness and generalization of a traditional rumor detection model are poor: changing a few words while the meaning of the sentence remains the same may cause significant changes in the prediction results. To alleviate that problem, we designed a novel rumor detection model called HAT4RD to enhance the generalization ability and robustness of automatic rumor detection. Our model detects rumors based on an event, which includes a source post and a certain number of replies. To make full use of the tweet object information and obtain a high-level representation, we took a hierarchical architecture as the skeleton of our model.
To enhance the robustness of our model, adversarial training is included in our model. Using more adversarial data to train the model can enhance the robustness and generalization of the model. However, natural language text space is sparse, and it is impossible to exhaust all possible changes manually to train a robust model. Thus, we perturb the sample space of post-level and event-level, respectively, to comprehensively improve the robustness of the model against changes in the text. The main contributions of this paper can be summarized as follows:
• We first propose a hierarchical adversarial training method that encourages the model to provide robust predictions under the perturbed post-level and event-level embedding spaces.
• We evaluate the proposed model HAT4RD on three real-world datasets. The experimental results demonstrate that our model outperforms state-of-the-art models.
• We prove through experiments that the proposed hierarchical adversarial training method can enhance the robustness and generalization of the model and prevent the model from being deceived by disguised rumors.

Rumor Detection
With the development of artificial intelligence, existing automated rumor detection methods are mainly based on deep neural networks. Ma et al. [6] were the first to use a deep learning network, an RNN (Recurrent Neural Network)-based model, for automatic misinformation detection. Chen et al. [7] and Yu et al. [8] introduced an attention mechanism into an RNN or CNN (Convolutional Neural Network) model to process a certain number of sequential posts for debunking rumors. Ajao et al. [9] proposed a framework combining CNN and LSTM (Long Short-Term Memory) to classify rumors.
Shu et al. [10] delved into an explainable rumor detection model by using both news content and user comments. Guo et al. [11], Sujana et al. [12] detected rumors by creating a hierarchical neural network to obtain higher-level textual information representations. Yang et al. [13] proposed a rumor detection model that can handle both text and images. Ruchansky et al. [14] analyzed articles and extracted user characteristics to debunk rumors. Ma et al. [15] constructed a recursive neural network to handle conversational structure. Their model was presented as a bottom-up and top-down propagation tree-structured neural network.
Li et al. [16,17] used a variable-structure graph neural network to simulate rumor propagation and obtain more precise information representations in the rumor detection task. Ni et al. [1] used multi-view attention networks to simultaneously capture clue words in the rumor text and suspicious users in the propagation structure. Gumaei et al. [18] proposed an extreme gradient boosting (XGBoost) classifier for rumor detection of Arabic tweets. Li et al. [19] combined objective facts and subjective views for an evidence-based rumor detection. No rumor detection model currently takes adversarial robustness into account.

Adversarial Training
Adversarial training is an important method to enhance the robustness of neural networks. Szegedy et al. [20] first proposed the theory of adversarial training by adding small generated perturbations on input images. The perturbed image pixels were later named as adversarial examples. Goodfellow et al. [21] proposed a fast adversarial example generation approach to attempt to obtain the perturbation value that maximizes adversarial loss. Jia and Liang [22] were the first to adopt adversarial example generation for natural language processing tasks.
Zhao et al. [23] found that when adopting the gradient-based adversarial training method on natural language processing tasks, the generated adversarial examples were invalid characters or word sequences. Gong et al. [24] utilized word vectors as the input for deep-learning models; however, this also generated words that could not be matched with any words in the word embedding space. Ni et al. [25] proposed a random masked weight adversarial training method to improve generalization of neural networks. However, thus far there is no adversarial training method designed for rumor-specific hierarchical structures.

Problem Definition
We define false information that is socially inconsistent with facts to be rumors. Furthermore, we define the task of rumor detection as determining whether an event is a rumor based on the relevant information (such as the text content, comments and propagation patterns) of microblogs posted on social media platforms. We treat the original post and its reply posts together as an event (see Figure 2 for a real-world example of an event) for rumor detection. A whole event as the final decision-making unit contains a wealth of internal logic and user stance information.
Multiple events in the dataset are defined as $D = \{E_1, E_2, \ldots, E_{|e|}\}$. An event consists of a source post and several reply posts, $E_j = \{P_s, P_1, P_2, \ldots, P_{|p|}\}$. It should be noted that different events are composed of different numbers of posts, and a post is composed of different numbers of words, meaning our model needs to be able to process variable-length sequence information with a hierarchical structure. The event-level classifier can learn from labeled event data, that is, $E_j = \{P_s, P_1, P_2, \ldots, P_{|p|}\} \rightarrow y_j$. In addition, because an event contains multiple posts, we make the posts within the same event share the event's label. The post-level classifier $P_n = \{x_1, x_2, \ldots, x_{|x|}\} \rightarrow y_n$ can, therefore, be established.
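The event structure and the label-sharing rule can be sketched in a few lines of Python. This is a minimal illustration only: the class names `Post` and `Event`, the helper `post_level_examples`, and the encoding 1 = rumor, 0 = non-rumor are assumptions made for the example, not names taken from the original implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Post:
    words: List[str]  # variable-length token sequence x_1 .. x_|x|


@dataclass
class Event:
    source: Post          # P_s
    replies: List[Post]   # P_1 .. P_|p|
    label: int            # y_j; assumed encoding: 1 = rumor, 0 = non-rumor

    def posts(self) -> List[Post]:
        """Source post first, then the replies in order."""
        return [self.source] + self.replies


def post_level_examples(event: Event) -> List[Tuple[Post, int]]:
    """Every post inherits the label of the event it belongs to."""
    return [(post, event.label) for post in event.posts()]


event = Event(
    source=Post("police say shots fired".split()),
    replies=[Post("is this confirmed ?".split()), Post("stay safe".split())],
    label=1,
)
examples = post_level_examples(event)
print(len(examples), [y for _, y in examples])  # 3 [1, 1, 1]
```

Both the number of replies and the number of words per post vary freely here, which is the variable-length, two-level structure the model has to handle.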

Preliminaries
Rumors in social media have a hierarchical structure of post-level and event-level. In response to this special data structure, we built the HAT4RD model based on the hierarchical BiLSTM (Bi-directional Long Short-Term Memory), which can be divided into the post-level module and event-level module, as shown in Figure 3. Hierarchical Adversarial Training (HAT) is a novel adversarial training method based on the hierarchical structure model. The overall hierarchical adversarial training procedure is shown in Algorithm 1.
Taking the text of all posts under the event as input, we compute the embedding of each word with GloVe [26] word vectors to obtain the input of the post-level BiLSTM:

$$I_p = \{x_1, x_2, \ldots, x_{|x|}\} \quad (1)$$

where $x_i$ is the pre-trained word vector and $I_p$ is the input of the post-level BiLSTM. The word vectors of each post pass through the post-level BiLSTM layer in order. For each time point $t$:

$$h^{p}_{t} = \mathrm{BiLSTM}_p(x_t, h^{p}_{t-1}) \quad (2)$$

The cell state $h^{p}_{t}$ of the uppermost $\mathrm{LSTM}_p$ layer at the last time point is used as the post encoding. Because the structure is bidirectional, the final states of both directions are concatenated, and an event can be represented by a matrix in which each column is a vector representing a post:

$$O_p = [h^{p}_{s}, h^{p}_{1}, \ldots, h^{p}_{|p|}] \quad (3)$$

$$I_e = O_p \quad (4)$$

where $h^{p}_{s}$ is the embedding of the source post, $h^{p}_{i}$ is the embedding of the $i$-th reply post, $O_p$ is the output of the post-level BiLSTM, and $I_e$ is the input of the event-level BiLSTM.

For the next module, the event-level BiLSTM encoding process is similar to that of the post-level BiLSTM. The difference lies in the input unit: the post-level BiLSTM reads a post as a sequence of word vectors, while the event-level BiLSTM reads an event as a sequence of post vectors:

$$h^{e}_{t} = \mathrm{BiLSTM}_e(I_e^{(t)}, h^{e}_{t-1}) \quad (5)$$

where $I_e^{(t)}$ is the $t$-th column of $I_e$. In the rumor detection task, the state $h^{e}_{t}$ of the last layer of the event-level BiLSTM at the last time point can be understood as a comprehensive representation of all posts. Based on the principle of multi-task learning, rumor post classification and rumor event classification are highly related, and the parameters of the post-level module are shared between the two tasks. A post-level auxiliary classifier and an event-level primary classifier were therefore included in the hierarchical model. The post-level auxiliary classifier mainly serves to accelerate training and prevent vanishing gradients.

The two classifiers produce the post-level and event-level classification results $\hat{y}_p$ and $\hat{y}_e$, respectively. Each classifier is a fully connected layer; $W_p$ and $W_e$ are the weights of these layers, and $b_p$ and $b_e$ are the biases. The goal of training is to minimize the deviation between the predicted values and the true labels. $L_p$ and $L_e$ are the post-level loss and event-level loss, respectively; $\alpha$ is the loss coefficient weight that balances $L_p$ and $L_e$; and $L_t$ is the total loss of the entire rumor detection model, used to update the parameters. $y$ is the real label; $\hat{y}_r$ and $\hat{y}_n$ are the model's predictions for the two labels, rumor and non-rumor. The gradient of the model is calculated from the total loss $L_t$:

$$g = \nabla_{\theta} L_t(\theta, x, y)$$

Algorithm 1: Hierarchical adversarial training (HAT)
Input: training set $X$, perturbation coefficients $\epsilon_p$ and $\epsilon_e$, learning rate $\tau$
for each epoch do
  for $(x, y) \in X$ do
    Forward propagation, compute the loss: $L_t(\theta, x, y)$
    Backward propagation, compute the gradient: $g \leftarrow \nabla_{\theta} L_t(\theta, x, y)$
    Compute the hierarchical adversarial perturbations:
      $g_p \leftarrow \nabla_{x_p} L_t(\theta, x_p, (y_p, y_e))$; $g_e \leftarrow \nabla_{x_e} L_e(\theta, x_e, y_e)$
      $r_p \leftarrow \epsilon_p \cdot g_p / \|g_p\|_2$; $r_e \leftarrow \epsilon_e \cdot g_e / \|g_e\|_2$
    Forward and backward propagation, compute the adversarial gradients:
      $g^{p}_{adv} \leftarrow \nabla_{\theta} L_t(\theta, x_p + r_p, (y_p, y_e))$
      $g^{e}_{adv} \leftarrow \nabla_{\theta} L^{e}_{e_{adv}}(\theta, x_e + r_e, y_e)$
    Update the parameters:
      $\theta \leftarrow \theta - \tau (g + g^{p}_{adv} + g^{e}_{adv})$
  end for
end for
Output: $\theta$

Hierarchical Adversarial Training
The above is forward propagation under standard training of the model. To enhance the robustness of our model, a hierarchical adversarial training method is adopted. This adversarial optimization process is expressed with the following min-max formula:

$$\min_{\theta} \mathbb{E}_{(x, y) \sim D} \Big\{ \max_{\|\delta_p\|_2 \le \epsilon_p,\ \|\delta_e\|_2 \le \epsilon_e} \big[ L_t(\theta, x_p + \delta_p, (y_p, y_e)) + L_e(\theta, x_e + \delta_e, y_e) \big] \Big\} \quad (12)$$

where $\delta_p$ and $\delta_e$ are the perturbations of the post-level input $x_p$ and event-level input $x_e$ that maximize the inner risk. We estimate these values by linearizing $L_t(\theta, x_p, (y_p, y_e))$ and $L_e(\theta, x_e, y_e)$ around $x_p$ and $x_e$, respectively. Under this linear approximation and the L2 norm constraint, the resulting adversarial perturbations are:

$$\delta_p = \epsilon_p \cdot \frac{\nabla_{x_p} L_t(\theta, x_p, (y_p, y_e))}{\|\nabla_{x_p} L_t(\theta, x_p, (y_p, y_e))\|_2} \quad (13)$$

$$\delta_e = \epsilon_e \cdot \frac{\nabla_{x_e} L_e(\theta, x_e, y_e)}{\|\nabla_{x_e} L_e(\theta, x_e, y_e)\|_2} \quad (14)$$

where $\epsilon_p$ and $\epsilon_e$ are the perturbation coefficients. Note that the perturbation $\delta_p$ is calculated by back-propagating the total loss $L_t$ instead of $L_p$, because adding the perturbation $\delta_p$ increases $L_p$ and $L_e$ at the same time.
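The step from the inner maximization to the normalized-gradient perturbations in Equations (13) and (14) follows the usual first-order argument. The sketch below writes it for a generic loss $L$, input $x$ and perturbation budget $\epsilon$ (the perturbation coefficient at the corresponding level); it is the standard textbook reasoning, included for clarity, and is not reproduced from the original derivation.

```latex
\begin{aligned}
\max_{\|\delta\|_2 \le \epsilon} L(\theta, x + \delta, y)
  &\approx \max_{\|\delta\|_2 \le \epsilon}
     \left[ L(\theta, x, y) + \delta^{\top} \nabla_x L(\theta, x, y) \right]
     && \text{(first-order Taylor expansion)} \\
\delta^{\top} \nabla_x L(\theta, x, y)
  &\le \|\delta\|_2 \, \|\nabla_x L(\theta, x, y)\|_2
   \le \epsilon \, \|\nabla_x L(\theta, x, y)\|_2
     && \text{(Cauchy--Schwarz)} \\
\delta^{\star}
  &= \epsilon \cdot \frac{\nabla_x L(\theta, x, y)}{\|\nabla_x L(\theta, x, y)\|_2}
     && \text{(equality when } \delta \parallel \nabla_x L \text{)}
\end{aligned}
```

Substituting $(L_t, x_p, \epsilon_p)$ or $(L_e, x_e, \epsilon_e)$ for $(L, x, \epsilon)$ gives the post-level and event-level perturbations, respectively.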

Post-Level Adversarial Training
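After a normal forward and backward propagation, $\delta_p$ and $\delta_e$ are calculated from the gradient. In post-level adversarial training, we add the perturbation to the word vectors to obtain the adversarial input of the post-level BiLSTM:

$$I^{p}_{adv} = \{x_1 + \delta^{p}_{1}, x_2 + \delta^{p}_{2}, \ldots, x_{|x|} + \delta^{p}_{|x|}\}$$

where $I^{p}_{adv}$ is the adversarial input of the post-level BiLSTM, and $\delta^{p}_{n}$ is the post-level perturbation added to the word vector $x_n$. The perturbed word vectors of each post then pass through the post-level BiLSTM layer in order. For each time point $t$:

$$h^{p_{adv}}_{t} = \mathrm{BiLSTM}_p(x_t + \delta^{p}_{t}, h^{p_{adv}}_{t-1})$$

The adversarial cell state $h^{p_{adv}}_{t}$ then replaces $h^{p}_{t}$ in the subsequent computation, and the post-level adversarial gradient $g^{p}_{adv}$ is obtained by backpropagation.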

Event-Level Adversarial Training
We next performed event-level adversarial training and repeated the process of Equations (1)-(3) to obtain the post vectors. The event-level perturbation was then added to the post vectors to obtain the adversarial input of the event-level BiLSTM:

$$I^{e}_{adv} = \{h^{p}_{s} + \delta^{e}_{s}, h^{p}_{1} + \delta^{e}_{1}, \ldots, h^{p}_{|p|} + \delta^{e}_{|p|}\}$$

In the same way, we input $I^{e}_{adv}$ into the event-level BiLSTM to obtain the final event representation vector $h^{e_{adv}}_{t}$, replace $h^{e}_{t}$ with $h^{e_{adv}}_{t}$ and calculate the adversarial loss $L^{e}_{e_{adv}}$ of the event-level perturbation through Equations (6)-(9). Finally, the event-level adversarial gradient $g^{e}_{adv}$ is calculated by backpropagation:

$$g^{e}_{adv} = \nabla_{\theta} L^{e}_{e_{adv}}(\theta, x_e + \delta_e, y_e)$$

The gradient from standard training, the gradient from post-level adversarial training and the gradient from event-level adversarial training are then combined to update the model parameters:

$$\theta \leftarrow \theta - \tau (g + g^{p}_{adv} + g^{e}_{adv})$$

where $\tau$ is the learning rate.
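The full update can be traced end to end on a deliberately tiny stand-in model. The sketch below is not the HAT4RD implementation. It assumes mean pooling in place of both BiLSTMs, a single logistic event-level loss with no auxiliary post-level classifier (so the total loss reduces to the event-level loss), and a perturbation normalized per vector. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def mean(vectors: List[Vec]) -> Vec:
    """Mean pooling; an assumed stand-in for a BiLSTM encoder."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def l2_scaled(g: Vec, eps: float) -> Vec:
    """eps * g / ||g||_2, or a zero vector if the gradient vanishes."""
    norm = math.sqrt(sum(v * v for v in g))
    return [0.0] * len(g) if norm == 0.0 else [eps * v / norm for v in g]


def loss_and_grads(w: Vec, b: float, posts: List[Vec], y: int):
    """Logistic loss on the pooled event vector, plus gradients w.r.t. w, b
    and each post vector (identical for every post under mean pooling)."""
    e = mean(posts)
    p = 1.0 / (1.0 + math.exp(-(sum(wi * ei for wi, ei in zip(w, e)) + b)))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    err = p - y
    return loss, [err * ei for ei in e], err, [err * wi / len(posts) for wi in w]


def hat_step(w: Vec, b: float, event: List[List[Vec]], y: int,
             eps_p: float = 1.0, eps_e: float = 0.3,
             lr: float = 0.1) -> Tuple[Vec, float, Tuple[float, float, float]]:
    """One update: theta <- theta - lr * (g + g_p_adv + g_e_adv)."""
    posts = [mean(words) for words in event]  # post-level encoding
    loss, g_w, g_b, g_post = loss_and_grads(w, b, posts, y)

    # Post-level: perturb every word vector, then re-encode the posts.
    # Under mean pooling the word gradient is g_post / len(words), which
    # points the same way as g_post, so the normalized step is shared.
    r_p = l2_scaled(g_post, eps_p)
    posts_p = [mean([[x + r for x, r in zip(word, r_p)] for word in words])
               for words in event]
    loss_p, gp_w, gp_b, _ = loss_and_grads(w, b, posts_p, y)

    # Event-level: perturb the clean post vectors directly.
    r_e = l2_scaled(g_post, eps_e)
    posts_e = [[h + r for h, r in zip(post, r_e)] for post in posts]
    loss_e, ge_w, ge_b, _ = loss_and_grads(w, b, posts_e, y)

    new_w = [wi - lr * (a + c + d) for wi, a, c, d in zip(w, g_w, gp_w, ge_w)]
    new_b = b - lr * (g_b + gp_b + ge_b)
    return new_w, new_b, (loss, loss_p, loss_e)


event = [
    [[0.2, -0.1], [0.4, 0.3]],               # source post: two word vectors
    [[-0.5, 0.1]],                           # reply 1: one word vector
    [[0.0, 0.6], [0.1, 0.1], [0.3, -0.2]],   # reply 2: three word vectors
]
w, b, (clean, post_adv, event_adv) = hat_step([0.5, -0.25], 0.0, event, y=1)
print(clean, event_adv, post_adv)  # both adversarial losses exceed the clean loss
```

Because the stand-in model is linear in its inputs, both perturbations push the logit in the loss-increasing direction, and the larger coefficient produces the larger adversarial loss. With real BiLSTM encoders the two levels perturb different representations and are no longer interchangeable in this way.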

Datasets
Three well-known public rumor datasets, PHEME 2017, PHEME 2018 [5] and WEIBO [6], were used to evaluate our method HAT4RD. Among them, the original data of PHEME 2017 and PHEME 2018 are from the Twitter social platform, and the language is English; the original data of WEIBO is from the Sina Weibo social platform, and the language is Simplified Chinese. In these three datasets, each event is composed of a source post and several reply posts. The statistical details of these three datasets are shown in Table 1.
"Users" represents the number of users in the datasets; "Posts" represents the number of posts in the datasets; "Event" represents the number of events in the datasets (that is, the number of source posts); "Avg words/post" represents the average number of words contained in a post; "Avg posts/event" represents the average number of posts contained in an event; "Rumor" represents the number of rumors in the datasets; "Non-rumor" represents the number of non-rumors in the datasets; and "Balance degree" represents the percentage of rumors in the datasets.

Evaluation Metrics
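For a fair comparison, we adopted the same evaluation metrics used in previous work [19]. Accuracy, Precision, Recall and F1-measure (F1) were therefore adopted for evaluation, defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false negative predictions, respectively.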
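As a concrete illustration, the four metrics can be computed from the confusion counts as follows. This is a minimal sketch: the function name `binary_metrics` is hypothetical, and treating the rumor class (label 1) as the positive class is an assumption of the example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with label 1 (rumor) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # TP=3, TN=3, FP=1, FN=1, so all four values are 0.75
```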

Experimental Settings
Following the work of [19], the datasets were split for our experiment: 80% for training, 10% for validation and 10% for testing. We trained all the models by backpropagating the derivative of the loss function and used the Adam optimizer [27] to update the parameters. To map post text to embeddings, we used GloVe's [26] pre-trained 300-dimensional word vectors.
For the hyperparameters, the maximum vocabulary size was 80,000, the batch size was 64, the dropout rate was 0.5, the BiLSTM hidden size was 512, the loss coefficient weight $\alpha$ was 0.1, the learning rate was 0.0001, and the perturbation coefficients $\epsilon_p$ and $\epsilon_e$ were 1.0 and 0.3, respectively. Our proposed model was trained for 100 epochs with early stopping. In addition, all experiments were run under the following hardware environment: CPU: Intel(R) Core(TM) i7-8700 CPU@3.20GHz, GPU: GeForce RTX 2080, 10G.
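An 80/10/10 split of this kind can be sketched as below. The text does not state how the split was drawn, so the event-level random shuffle with a fixed seed is an assumption of the example, and the function name `split_80_10_10` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T],
                   seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10   # integer arithmetic avoids float rounding
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder goes to the test set
    return train, val, test


event_ids = list(range(1000))
train, val, test = split_80_10_10(event_ids)
print(len(train), len(val), len(test))  # 800 100 100
```

Splitting by event rather than by post keeps all posts of one event in the same partition, which avoids leaking replies of a test event into training.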

Performance Comparison
Our HAT4RD model was compared with other well-known rumor detection models to evaluate our model's rumor debunking performance.
• SVM-BOW: a naive rumor detection baseline, which is an SVM that uses bag-of-words for word representation [15].
• TextCNN: a naive rumor detection baseline based on deep convolutional neural networks [28].
• BiLSTM: an RNN-based bidirectional model that detects rumors by considering bidirectional information [29].
• BERT: a well-known pre-trained language model. We fine-tuned a BERT-base model to detect rumors [4].
• CSI: a state-of-the-art model that detects rumors by scoring users based on their behavior [14].
• CRNN: a hybrid model that combines a recurrent neural network and a convolutional neural network to detect rumors [30].
• RDM: a rumor detection model that integrates reinforcement learning and deep learning for early rumor detection [31].
• CSRD: a rumor detection model that classifies rumors by simulating the conversation structure of comments using GraphSAGE [16].
• EHCS-Con: a model that exploits user homogeneity by using the node2vec mechanism to encode users' follow-follower relationships for rumor detection [17].
• LOSIRD: a state-of-the-art rumor detection model that leverages objective facts and subjective views for interpretable rumor detection [19].

Main Experiment Results
The results of different rumor detection models are compared in Table 2; the HAT4RD clearly performed the best in terms of rumor detection compared to the other methods based on the three datasets with 92.5% accuracy on PHEME 2017, 93.7% on PHEME 2018 and 94.8% on WEIBO. In addition, the precision, recall and F1 were all higher than 91% in the HAT4RD model. Our HAT4RD improved on the F1 value of the SOTA model by about 1.5% on the dataset WEIBO. These results demonstrate the effectiveness of the hierarchical structure model and hierarchical adversarial training in rumor detection. However, the SVM-BOW result is poor because the traditional statistical machine-learning method could not handle this complicated task.
The results of the CNN, BiLSTM, BERT and RDM models were poorer than ours due to their insufficient information extraction capabilities. The models are based on postprocessing information and cannot obtain a high-level representation from the hierarchy. Compared to other models, our HAT4RD model has a hierarchical structure and performs different levels of adversarial training. This enhances both the post-level and event-level sample space and improves the robustness and generalization of the rumor detection model.

Ablation Analysis
To evaluate the effectiveness of every component of the proposed HAT4RD, we removed each one of them from the entire model for comparison. "ALL" denotes the entire HAT4RD model with all components, including post-level adversarial training (PA), event-level adversarial training (EA), the post-level auxiliary classifier (PC) and the event-level primary classifier (EC). After removal, we obtained the sub-models "-PA", "-EA", "-PC" and "-EC", respectively. "-PA-PC" means that both the post-level adversarial training and the auxiliary classifier were removed. "-PA-EA" denotes the reduced HAT4RD without both post-level and event-level adversarial training. The results are shown in Figure 4. It can be observed that every component plays a significant role in improving the performance of HAT4RD. HAT4RD outperforms ALL-PA and ALL-EA, which shows that post-level and event-level adversarial training are each helpful in rumor detection and that hierarchical adversarial training is more effective than single-level adversarial training. Both ALL-PA and ALL-EA were better than ALL-PA-EA, which shows that even a single level of adversarial training improves on training without it. The performance of ALL-PC was lower than that of HAT4RD, indicating that the post-level auxiliary classifier contributes to the learning and convergence of the model.

Early Rumor Detection
Our model's performance in early rumor detection was evaluated. To simulate the early stage rumor detection scenarios in the real world, nine different size test sets from PHEME 2017, PHEME 2018 and WEIBO were created. Each test set contained a certain number of posts, ranging from 5 to 45. We found that HAT4RD could detect rumors with an approximate 91% accuracy rate with only five posts as illustrated in Figure 5. Compared to the other models, our model uses hierarchical adversarial training and continuously generates optimal adversarial samples to join the training. It, therefore, has good generalization despite limited information.
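The construction of the early-detection test sets can be illustrated by truncating each event to its earliest posts. The sketch assumes that posts are stored in chronological order with the source post first and that the nine cut-offs are evenly spaced (5, 10, ..., 45); the text only gives the range and the number of test sets, so the spacing is an assumption, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def truncate_event(posts: Sequence[str], max_posts: int) -> List[str]:
    """Keep the source post and the earliest replies, max_posts in total."""
    return list(posts[:max_posts])


def early_detection_views(posts: Sequence[str],
                          sizes: Iterable[int] = range(5, 50, 5)
                          ) -> Dict[int, List[str]]:
    """One truncated copy of the event per cut-off size."""
    return {k: truncate_event(posts, k) for k in sizes}


# Hypothetical event with 60 posts in chronological order; index 0 is the source.
event_posts = [f"post_{i}" for i in range(60)]
views = early_detection_views(event_posts)
print(sorted(views))  # [5, 10, 15, 20, 25, 30, 35, 40, 45]
```

Evaluating the same trained model on each truncated view gives one accuracy value per cut-off, which is the kind of curve reported for early detection.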

Robustness Analysis
We used OpenAttack (https://github.com/thunlp/OpenAttack (accessed on 8 May 2022)) [32] to conduct a variety of adversarial attacks on the models and compared the robustness of various recent models. FGSM, drawn from [21], is a gradient-based adversarial attack method. HotFlip [33] uses gradient-based word or character substitution to attack. PWWS [34] uses a greedy word substitution order determined by the word saliency and weighted by the classification probability. As shown in Tables 3 and 4, our model shows the smallest performance degradation under the three adversarial attacks compared to the other baseline models. The robustness of our model is particularly clear for gradient-based attacks: under the FGSM attack, the performance of our model dropped by only about 10%. Under the HotFlip and PWWS attacks, our model HAT4RD was also significantly more robust than the other models. To further analyze the effectiveness of the hierarchical adversarial training method visually, we drew the high-dimensional non-convex loss function with a visualization method (https://github.com/tomgoldstein/loss-landscape (accessed on 8 May 2022)) proposed by [35]. We visualize the loss landscapes around the minima of the empirical risk selected by standard and hierarchical adversarial training with the same model structure.
The 2D and 3D views are plotted in Figure 6. We defined two direction vectors, $d_x$ and $d_y$, with the same dimensions as $\theta$, drawn from a Gaussian distribution with zero mean and a scale of the same order of magnitude as the variance of the layer weights. We then chose a center point $\theta^*$ and added a linear combination of $d_x$ and $d_y$, weighted by the coefficients $\alpha$ and $\beta$, to obtain a loss that is a function of the contribution of the two random direction vectors:

$$f(\alpha, \beta) = L(\theta^* + \alpha d_x + \beta d_y)$$
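The two-direction slice of the loss surface, f(α, β) = L(θ* + α·d_x + β·d_y), can be sketched on a toy problem. In the example below, the quadratic `toy_loss`, the three-dimensional parameter vector and the unit-variance Gaussian directions are all assumptions chosen to keep the code self-contained; a real run would evaluate the trained network's loss and scale the directions to the layer weights.

```python
import random
from typing import Callable, List, Sequence

Vec = List[float]


def loss_surface(loss: Callable[[Vec], float], theta_star: Vec,
                 d_x: Vec, d_y: Vec,
                 coords: Sequence[float]) -> List[List[float]]:
    """Grid of f(a, b) = loss(theta* + a*d_x + b*d_y).

    Rows are indexed by b (coefficient of d_y), columns by a (coefficient of d_x).
    """
    def shifted(a: float, b: float) -> Vec:
        return [t + a * u + b * v for t, u, v in zip(theta_star, d_x, d_y)]

    return [[loss(shifted(a, b)) for a in coords] for b in coords]


def toy_loss(theta: Vec) -> float:
    """Toy quadratic bowl with its minimum at the origin."""
    return sum((i + 1) * t * t for i, t in enumerate(theta))


rng = random.Random(0)
theta_star = [0.0, 0.0, 0.0]                       # center point (the minimum)
d_x = [rng.gauss(0.0, 1.0) for _ in theta_star]    # random direction 1
d_y = [rng.gauss(0.0, 1.0) for _ in theta_star]    # random direction 2
coords = [-1.0, -0.5, 0.0, 0.5, 1.0]

surface = loss_surface(toy_loss, theta_star, d_x, d_y, coords)
print(surface[2][2])  # 0.0, the loss at the center point
```

A flatter minimum shows up as a grid whose values grow slowly away from the center; a sharp minimum grows quickly for the same coordinates.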
The results show that the hierarchical adversarial training method indeed selects flatter loss landscapes by dynamically generating post-level perturbation and event-level perturbation. Having a flatter loss function indicates that the model is more robust in input features and can prevent the model from overfitting. Empirically, many studies have shown that a flatter loss landscape usually means better generalization [36][37][38].

Conclusions
Herein, we proposed a new hierarchical adversarial training for rumor detection that considers the camouflages and variability of rumors from an adversarial perspective. Dynamically generating perturbations on the post-level and event-level embedding vectors enhanced the model's robustness and generalization.
The evaluations on three real-world rumor detection datasets from social media showed that our HAT4RD model outperformed the state-of-the-art methods. Numerically, the F1 of our proposed HAT4RD was 1.1%, 1.1% and 1.5% higher than that of the state-of-the-art model LOSIRD on the three public rumor detection datasets, respectively. The early rumor detection performance of our model also exceeded that of the other models.
We examined the contribution of each part to the model performance through ablation experiments. Moreover, visual experiments proved that the hierarchical adversarial training method we proposed can optimize the model for a flatter loss landscape. Our HAT4RD model is general and can be applied to data on any topic, as long as the data is posted on social media (e.g., Twitter and Weibo). The ability of our model depends on the training dataset. We only need to add the corresponding data to the model training to detect rumors of different topics.

Future Work
Robustness and generalization are the focus of rumor detection. In the future, we can integrate features, such as text and images for multi-modal adversarial training, to further enhance the model. In addition, for the unique structure of posts and events, we propose that graph neural networks will also be a good research direction, and graph neural networks can be combined with adversarial training to obtain graph adversarial training. We think this will be an interesting research direction. As rumor data collection and labeling are complicated and time-consuming, the recently popular prompt learning based on pre-trained language models for few-shot rumor detection is also worth studying.

Limitations
Finally, our model currently has certain limitations. Since our model includes hierarchical adversarial training, the training time is longer than the general model. Moreover, although our hierarchical adversarial training improves the robustness of the model, our model still has room for improvement due to the diversity of rumors and the sparsity of natural language.